Automatically Building a Lexicon from Raw Noisy Data in a Closed Domain

Authors

  • Marina Sokolova
  • Stan Szpakowicz
  • Vivi Nastase
Abstract

The natural language that people use in electronic communication is far from perfect, due to the narrow channel. This also applies to electronic negotiation. We analyze characteristics of the language data obtained from electronic negotiation. We introduce a novel procedure for extracting and building a lexicon from raw noisy data. The data belong to a closed domain, which allows us to perform domain-dependent word-sense disambiguation. The procedure itself is domain-independent and should work with data from various text collections. We present the results of applying our procedure to a text corpus collected by an electronic negotiation support system. http://interneg.org/

Similar resources

Output-only Modal Analysis of a Beam Via Frequency Domain Decomposition Method Using Noisy Data

The output data from a structure is the building block for output-only modal analysis. The structure response in the output data, however, is usually contaminated with noise. Naturally, the success of output-only methods in determining the modal parameters of a structure depends on noise level. In this paper, the possibility and accuracy of identifying the modal parameters of a simply supported...

Building Domain-Specific Taggers without Annotated (Domain) Data

Part of speech tagging is a fundamental component in many NLP systems. When taggers developed in one domain are used in another domain, the performance can degrade considerably. We present a method for developing taggers for new domains without requiring POS annotated text in the new domain. Our method involves using raw domain text and identifying related words to form a domain specific lexico...

A Linguistic Analysis of Conference Titles in Applied Linguistics

Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...

Bootstrapping Noun Groups Using Closed-Class Elements Only

The identification of noun groups in text is a well researched task and serves as a pre-step for other natural language processing tasks, such as the extraction of keyphrases or technical terms. We present a first version of a noun group chunker that, given an unannotated text corpus, adapts itself to the domain at hand in an unsupervised way. Our approach is inspired by findings from cognitive...

Publication date: 2004